Abstract

We  describe  a  system  called  Tileworld,  which  con¬ 
sists  of  a  simulated  robot  agent  and  a  simulated 
environment  which  is  both  dynamic  and  unpre¬ 
dictable.  Both  the  agent  and  the  environment 
are  highly  parameterized,  enabling  one  to  control 
certain  characteristics  of  each.  We  can  thus  ex¬ 
perimentally  investigate  the  behavior  of  various 
meta-level  reasoning  strategies  by  tuning  the  pa¬ 
rameters  of  the  agent,  and  can  assess  the  success 
of  alternative  strategies  in  different  environments 
by  tuning  the  environmental  parameters.  Our  hy¬ 
pothesis  is  that  the  appropriateness  of  a  particular 
meta-level  reasoning  strategy  will  depend  in  large 
part  upon  the  characteristics  of  the  environment 
in which the agent incorporating that strategy is
situated. We describe our initial experiments using
Tileworld, in which we have been evaluating a
version of the meta-level reasoning strategy proposed
in earlier work by one of the authors [Bratman
et al., 1988].

Introduction 

Recently  there  has  been  a  surge  of  interest  in  systems 
that  are  capable  of  intelligent  behavior  in  dynamic,  un¬ 
predictable  environments.  Because  agents  inevitably 
have  bounded  computational  resources,  their  delibera¬ 
tions  about  what  to  do  take  time,  and  so,  in  dynamic 
environments,  they  run  the  risk  that  things  will  change 
while  they  reason.  Indeed,  things  may  change  in  ways 
that  undermine  the  very  assumptions  upon  which  the 
reasoning  is  proceeding.  The  agent  may  begin  a  delib¬ 
eration  problem  with  a  particular  set  of  available  op¬ 
tions,  but,  in  a  dynamic  environment,  new  options  may 
arise,  and  formerly  existing  options  disappear,  during 
the  course  of  the  deliberation.  An  agent  that  blindly 
pushes  forward  with  the  original  deliberation  problem, 
without regard to the amount of time it is taking or
the  changes  meanwhile  going  on,  is  not  likely  to  make 
rational  decisions. 

One solution that has been proposed eliminates explicit
execution-time reasoning by compiling into the
agent all decisions about what to do in particular
situations [Agre and Chapman, 1987, Brooks, 1987,
Kaelbling, 1988]. This is an interesting endeavor, but
its ultimate feasibility for complex domains remains an
open question.

An alternative is to design agents that perform explicit
reasoning at execution time, but manage that
reasoning by engaging in meta-level reasoning. Within
the past few years, researchers in AI have provided theoretical
analyses of meta-level reasoning, often applying
decision-theoretic notions to it [Boddy and Dean,
1989, Horvitz, 1987, Russell and Wefald, 1989]. In addition,
architectural specifications for agents performing
meta-level reasoning have been developed [Bratman
et al., 1988], and prototype systems that engage
in meta-level reasoning have been implemented [Cohen
et al., 1989, Georgeff and Ingrand, 1989]. The project
we describe in this paper involves the implementation
of a system for experimentally evaluating competing
theoretical and architectural proposals.

More specifically, we have been constructing a system
called Tileworld, which consists of a simulated
robot agent and a simulated environment which is both
dynamic and unpredictable. Both the agent and the
environment are highly parameterized, enabling one to
control certain characteristics of each. We can thus
experimentally investigate the behavior of various
meta-level reasoning strategies by tuning the parameters of
the agent, and can assess the success of alternative
strategies in different environments by tuning the
environmental parameters. Our hypothesis is that the
appropriateness of a particular meta-level reasoning
strategy will depend in large part upon the characteristics
of the environment in which the agent incorporating
that strategy is situated. We shall describe
below how the parameters of our simulated environment
correspond to interesting characteristics of real,
dynamic environments.

Figure 1: A Typical Tileworld Starting State
(legend: a = agent, # = obstacle, T = tile, <digits> = hole)


In our initial experiments using Tileworld, we have
been evaluating a version of the meta-level reasoning
strategy proposed in earlier work by one of the authors
[Bratman et al., 1988]. However, the Tileworld can
be used to evaluate a range of competing proposals,
such as the ones mentioned above; agents instantiating
many alternative proposals can readily be imported
into the Tileworld environment.

The  Tileworld  Environment 

The  Tileworld  is  a  chessboard-like  grid  on  which  there 
are  agents,  tiles,  obstacles,  and  holes.  An  agent  is  a 
unit  square  which  is  able  to  move  up,  down,  left,  or 
right,  one  cell  at  a  time,  and  can,  in  so  doing,  move 
tiles.  A  tile  is  a  unit  square  which  “slides”:  rows  of 
tiles  can  be  pushed  by  the  agent.  An  obstacle  is  a 
group  of  grid  cells  which  are  immovable.  A  hole  is  a 
group  of  grid  cells,  each  of  which  can  be  “filled  in”  by 
a  tile  when  the  tile  is  moved  on  top  of  the  hole  cell;  the 
tile  and  particular  hole  cell  disappear,  leaving  a  blank 
cell.  When  all  the  cells  in  a  hole  are  filled  in,  the  agent 
gets  points  for  filling  the  hole.  The  agent  knows  ahead 
of  time  how  valuable  the  hole  is;  its  overall  goal  is  to 
get  as  many  points  as  possible  by  filling  in  holes. 
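
The rules above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the Tileworld simulator itself: the names World, Hole, and move are hypothetical, obstacles are stored as a set of blocked cells, and the agent is assumed to be unable to step onto an unfilled hole cell (the text does not say what happens in that case).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class Hole:
    cells: Set[Cell]  # hole cells that are still unfilled
    score: int        # points awarded once every cell is filled


@dataclass
class World:
    rows: int
    cols: int
    agent: Cell
    tiles: Set[Cell] = field(default_factory=set)
    obstacles: Set[Cell] = field(default_factory=set)
    holes: Dict[int, Hole] = field(default_factory=dict)
    points: int = 0

    def _in_bounds(self, cell: Cell) -> bool:
        return 0 <= cell[0] < self.rows and 0 <= cell[1] < self.cols

    def _hole_at(self, cell: Cell) -> Optional[int]:
        for hole_id, hole in self.holes.items():
            if cell in hole.cells:
                return hole_id
        return None

    def move(self, direction: str) -> bool:
        """Move the agent one cell, pushing any row of tiles ahead of it."""
        dr, dc = MOVES[direction]
        target = (self.agent[0] + dr, self.agent[1] + dc)
        # Collect the contiguous row of tiles directly in front of the agent.
        row = []
        cell = target
        while cell in self.tiles:
            row.append(cell)
            cell = (cell[0] + dr, cell[1] + dc)
        # `cell` is now the first non-tile cell beyond the row.
        if not self._in_bounds(cell) or cell in self.obstacles:
            return False
        hole_id = self._hole_at(cell)
        if row:
            # Shifting the row equals removing the nearest tile and
            # placing one tile at the far end.
            self.tiles.remove(row[0])
            if hole_id is None:
                self.tiles.add(cell)
            else:
                # The leading tile and the hole cell both disappear.
                hole = self.holes[hole_id]
                hole.cells.remove(cell)
                if not hole.cells:
                    self.points += hole.score
                    del self.holes[hole_id]
        elif hole_id is not None:
            return False  # assumption: the agent cannot walk into a hole
        self.agent = target
        return True
```

For instance, on a one-row grid with the agent at column 0, a tile at column 1, and a single-cell hole at column 3, two calls to move("right") push the tile into the hole and credit the hole's score.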

Figure  1  depicts  a  typical  Tileworld  starting  state.  A 
Tileworld  simulation  takes  place  dynamically:  it  begins 
in  a  state  which  is  randomly  generated  by  the  simulator 
according  to  a  set  of  parameters,  and  changes  contin¬ 
ually  over  time.  Objects  (holes,  tiles,  and  obstacles) 
appear  and  disappear  at  rates  determined  by  param¬ 
eters  set  by  the  experimenter,  while  at  the  same  time 
the  agent  moves  around  and  pushes  tiles  into  holes. 
The dynamic aspect of a Tileworld simulation distinguishes
it from many earlier domains that have been
used for studying AI planning, such as blocks-world.

The Tileworld can be viewed as a rough abstraction of
the  Robot  Delivery  Domain,  in  which  a  mobile  robot 
roams  the  halls  of  an  office  delivering  messages  and  ob¬ 
jects  in  response  to  human  requests.  We  have  been  able 
to  draw  a  fairly  close  correspondence  between  the  two 
domains  (i.e.,  the  appearance  of  a  hole  corresponds  to 
a  request,  the  hole  itself  corresponds  to  a  delivery  loca¬ 
tion,  tiles  correspond  to  messages  or  objects,  the  agent 
to  the  robot,  the  grid  to  hallways,  and  the  simulator 
time  to  real  time). 

Features  of  the  domain  put  a  variety  of  demands  on 
the  agent.  Its  spatial  complexity  is  nontrivial:  a  sim¬ 
ple  hill-climbing  strategy  can  have  modest  success,  but 
when  efficient  action  is  needed,  more  extensive  reason¬ 
ing  is  necessary.  But  the  time  spent  in  reasoning  has  an 
associated  cost,  both  in  lost  opportunities  and  in  unex¬ 
pected  changes  to  the  world;  thus  the  agent  must  make 
trade-offs  between  speed  and  accuracy,  and  must  mon¬ 
itor  the  execution  of  its  plans  to  ensure  success.  Time 
pressures  also  become  significant  as  multiple  goals  vie 
for  the  agent’s  attention. 

Of  course,  a  single  Tileworld  simulation,  however  in¬ 
teresting,  will  give  only  one  data  point  in  the  design 
space  of  robot  agents.  To  explore  the  space  more  vigor¬ 
ously,  we  must  be  able  to  vary  the  challenges  that  the 
domain  presents  to  the  agent.  We  have  therefore  pa¬ 
rameterized  the  domain,  and  provided  “knobs”  which 
can  be  adjusted  to  set  the  values  of  those  parameters. 

The  knob  settings  control  the  evolution  of  a  Tile¬ 
world  simulation.  Some  of  the  knobs  were  alluded  to 
earlier,  for  instance,  those  that  control  the  frequency 
of  appearance  and  disappearance  of  each  object  type. 
Other  knobs  control  the  number  and  average  size  of 
each  object  type.  Still  other  knobs  are  used  to  control 
factors  such  as  the  shape  of  the  distribution  of  scores 
associated  with  holes,  or  the  choice  between  the  instan¬ 
taneous  disappearance  of  a  hole  and  a  slow  decrease  in 
value  (a  hard  bound  versus  a  soft  bound).  For  each  set 
of  parameter  settings,  an  agent  can  be  tested  on  tens 
or  hundreds  of  randomly  generated  runs  automatically. 
Agents  can  be  compared  by  running  them  on  the  same 
set  of  pseudo-random  worlds;  the  simulator  is  designed 
to  minimize  noise  and  preserve  fine  distinctions  in  per¬ 
formance. 
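
Comparing agents on the same set of pseudo-random worlds can be sketched as follows. This is a minimal illustration, not the simulator's actual interface: compare_agents, make_world, and the two toy agents are hypothetical, and a "world" is reduced to three random hole scores so that the example stays self-contained.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# A trial function takes a seeded generator and returns the score achieved.
TrialFn = Callable[[random.Random], float]


def compare_agents(agents: Dict[str, TrialFn], seeds: List[int]) -> Dict[str, float]:
    """Average each agent's score over the same seeded worlds."""
    results = {}
    for name, run_trial in agents.items():
        # A fresh generator per (agent, seed) pair means every agent
        # faces exactly the same sequence of random worlds.
        results[name] = mean(run_trial(random.Random(seed)) for seed in seeds)
    return results


def make_world(rng: random.Random) -> List[int]:
    """Toy stand-in for a world: three hole scores between 1 and 100."""
    return [rng.randint(1, 100) for _ in range(3)]


def first_hole_agent(rng: random.Random) -> float:
    return float(make_world(rng)[0])  # always fills the first hole


def best_hole_agent(rng: random.Random) -> float:
    return float(max(make_world(rng)))  # fills the most valuable hole


if __name__ == "__main__":
    agents = {"first": first_hole_agent, "best": best_hole_agent}
    print(compare_agents(agents, seeds=list(range(30))))
```

Because both agents see identical worlds, any difference in the averages reflects the agents rather than the luck of the draw, which is the point of reusing the seeds.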

The  Tileworld  environment  is  intended  to  provide 
a  testbed  for  studying  a  wide  range  of  dynamic  do¬ 
mains  and  tasks  to  be  performed  in  them.  It  exhibits 
spatial  complexity,  a  central  feature  of  many  such  do¬ 
mains;  and  it  includes  tasks  of  varying  degrees  of  im¬ 
portance  and  difficulty.  It  is  generic:  although  we  have 
explored  connections  between  Tileworld  and  tasks  in¬ 
volving  robot  delivery,  Tileworld  is  not  tightly  coupled 
to  any  particular  application  domain,  but  instead  al¬ 
lows  an  experimenter  to  study  key  characteristics  of 
whatever  domain  he  or  she  is  interested  in,  by  varying 
parameter settings. For example, the experimenter can
focus on domains in which the central characteristic is
a wide distribution of task values (simulated in Tileworld
by hole scores), or of task difficulty (simulated
by hole size). In this regard, Tileworld differs from
the Phoenix simulator [Cohen et al., 1989], which is
more closely tied to a particular application. Instead,
the goals of the Tileworld project are closer to those of
the MICE simulator [Durfee and Montgomery, 1990].
However,  Tileworld  is  a  more  highly  dynamic  environ¬ 
ment  than  MICE.  Also,  where  MICE  is  used  to  focus 
on  issues  of  real-time  inter-agent  coordination,  Tile¬ 
world  is  intended  as  a  framework  for  the  more  general 
investigation  of  intelligent  behavior  in  dynamic  envi¬ 
ronments. 

Using  Plans  to  Constrain  Reasoning 

The agent we have implemented and used in our experiments
instantiates IRMA, the Intelligent Resource-Bounded
Machine Architecture [Bratman et al., 1988].
IRMA builds on observations made by Bratman [Bratman,
1987] that agents who are situated in dynamic
environments  benefit  from  having  plans  because  their 
plans  can  constrain  the  amount  of  subsequent  reason¬ 
ing  they  need  to  perform.  Two  constraining  roles  of 
plans  concern  us  here: 

•  An agent’s plans focus subsequent means-ends reasoning
so that the agent can, in general, concentrate
on elaborating its existing plans, rather than on computing
all possible courses of action that might be
undertaken.

•  An  agent’s  plans  restrict  the  set  of  further  poten¬ 
tial  courses  of  action  to  which  it  needs  to  give  full 
consideration,  by  filtering  out  options  that  are  in¬ 
consistent  with  the  performance  of  what  the  agent 
already  plans  to  do. 

The first role of plans has always been at least implicit
in the standard models of AI planning: AI planners
compute means to goals that the agent already has.
The second has a more dramatic effect on the architecture
we are investigating: it leads to the introduction of
a filtering mechanism, which manages execution-time
reasoning by restricting deliberation, in general, to options
that are compatible with the performance of already
intended actions. (To have the desired effect of
lessening the amount of reasoning needed, the filtering
mechanism must be computationally inexpensive,
relative to the cost of deliberation.)

Of  course,  a  rational  agent  cannot  always  remain 
committed  to  its  existing  plans.  Sometimes  plans  may 
be  subject  to  reconsideration  or  abandonment  in  light 
of  changes  in  belief.  But  if  an  agent  constantly  recon¬ 
siders  its  plans,  they  will  not  limit  deliberation  in  the 
way  they  need  to.  Thus,  an  agent’s  plans  should  be 
reasonably  stable. 

To achieve stability while at the same time allowing
for reconsideration of plans when necessary, the filtering
mechanism should have two components. The first
checks a new option for compatibility with the existing
plans. The second, an override mechanism, encodes
the conditions under which some portion of the existing
plans is to be suspended and weighed against some
other option. The filter override mechanism operates
in parallel with the compatibility filter. For a new
option to pass through the filter, it must either pass
the compatibility check or else trigger an override by
matching one of the conditions in the override mechanism.
A critical task for the designer of an IRMA-agent
is to construct a filter override mechanism so that it
embodies the right degree of sensitivity to the problems
and opportunities of the agent’s environment.

The  options  that  pass  through  the  filter  are  subject 
to  deliberation.  The  deliberation  process  is  what  actu¬ 
ally  selects  the  actions  the  agent  will  form  intentions 
towards.  In  other  words,  it  is  the  deliberation  pro¬ 
cess  that  performs  the  type  of  decision-making  that 
is  the  focus  of  traditional  decision  theory.  The  filter¬ 
ing  mechanism  thus  serves  to  frame  particular  decision 
problems,  which  the  deliberation  process  then  solves. 

The process of deliberation is different from means-ends
reasoning in our view, and this distinction is
worth discussing further. As we see it, deliberation
is deciding which of a set of options to pursue, while
means-ends reasoning is more a process of determining
how to achieve a given goal. We see means-ends reasoning
producing options (candidate plans to achieve
a goal), which can then be the subject of deliberation.

This may be a surprising distinction to those familiar
with the standard AI planning paradigm, in which
the  job  of  a  planner  is  usually  to  produce  the  single 
best  plan  according  to  some  set  of  criteria.  Any  delib¬ 
eration  which  is  to  be  done  in  such  a  system  is  done 
by  the  planner,  and  it  might  be  argued  that  a  planner 
is  the  best  place  for  such  reasoning.  Certainly  some 
pruning  of  alternatives  must  be  done  by  a  planner; 
however,  there  are  reasons  to  believe  that  some  delib¬ 
eration  belongs  outside  the  planner.  In  some  situations 
it  is  appropriate  to  have  several  means-ends  reasoners 
with  differences  in  solution  quality  and  time  required; 
these  must  be  invoked  appropriately  and  a  single  so¬ 
lution  chosen.  In  other  circumstances  it  is  desirable 
to  engage  in  a  decision-theoretic  analysis  of  compet¬ 
ing  alternatives.  Consequently,  we  have  maintained 
the  distinction  between  deliberation  and  means-ends 
reasoning  in  our  system. 

The  Tileworld  Agent 

In  implementing  an  IRMA-agent  for  the  Tileworld,  we 
adopted  a  model  of  a  robot  with  two  sets  of  process¬ 
ing  hardware.  One  processor  executes  a  short  control 
cycle  (the  act  cycle),  acting  on  previously  formulated 
plans  and  monitoring  the  world  for  changes.  The  sec¬ 
ond  processor  executes  a  longer  cycle  (the  reasoning 
cycle),  which  permits  computations  with  lengths  of  up 
to  several  seconds. 

The act cycle is straightforward; the agent performs
those acts that have been identified during the previous
reasoning cycle, monitoring for limited kinds of
failures. Perception also occurs during the act cycle:
the agent can access a global map of the world that indicates
the locations of all objects, as well as the score
and time remaining to timeout for all holes.

Figure 2: Tileworld Agent Architecture

The  reasoning  cycle  makes  decisions  about  what 
goals  to  pursue  and  how  to  pursue  them.  The  por¬ 
tion  of  the  agent  architecture  that  controls  reasoning 
is  depicted  in  Figure  2.  Processing  is  aimed  at  main¬ 
taining  the  intention  structure,  a  time-ordered  set  of 
tree-structured plans that represents the agent’s current
intentions. During any given reasoning cycle, one
of  two  things  can  happen: 

•  Potential  additions  to  the  intention  structure,  called 
options,  can  be  considered  by  the  filtering  and  delib¬ 
eration  processes.  These  options  can  come  from  two 
sources.  One,  the  agent  may  perceive  environmen¬ 
tal  changes  that  suggest  new  options — in  Tileworld, 
this  occurs  when  new  holes  or  tiles  appear.  Alterna¬ 
tively,  options  may  be  suggested  by  the  means-end 
reasoner. 

•  Means-ends  reasoning  can  be  performed  to  produce 
new  options  that  can  serve  as  means  to  current  in¬ 
tentions.  The  bulk  of  our  means-ends  reasoner  is  a 
special-purpose  route  planner. 

We  will  concentrate  here  on  the  filtering  and  deliber¬ 
ation  mechanisms.  All  options  are  in  principle  subject 
to  filtering  and  deliberation;  so  far,  however,  we  have 
confined  such  reasoning  to  top-level  options,  i.e.,  op¬ 
tions  to  fill  a  particular  hole. 

Recall  that  the  IRMA  filtering  mechanism  has  two 
parts:  the  compatibility  filter  and  the  filter  override. 
An option passes the filter if it either is compatible with
all existing intentions or triggers an override.

Compatibility  checking  of  top-level  options,  as  im¬ 
plemented,  is  straightforward.  A  top-level  option  is 
either  to  fill  a  hole  now  or  later;  if  the  agent  already 
has  a  current  intention  to  fill  a  particular  hole  now, 
then  an  option  to  fill  some  other  hole  now  is  incompat¬ 
ible.  All  intentions  to  fill  a  hole  later  are  compatible 
with  each  other. 

The  filter  override  must  identify  options  that  are 
potentially  valuable  enough  that  they  warrant  delib¬ 
eration  even  if  they  fail  the  compatibility  test.  The 
simplest  override  mechanism  compares  the  score  of  a 
hole  being  considered  as  an  option  to  that  of  the  hole 
currently  being  filled.  If  the  difference  between  them 
equals  or  exceeds  some  threshold  value  v,  then  the  new 
option  passes  the  filter.  The  threshold  value  is  set  by 
a  Tileworld  parameter.  Sometimes  it  may  sensibly  be 
set  to  a  negative  value:  in  that  case,  a  new  option 
could  be  subject  to  deliberation  even  if  it  involved  fill¬ 
ing  a  hole  with  a  lower  score  than  the  hole  currently 
being  filled.  This  might  be  reasonable,  since  the  new 
hole may, for instance, be much easier to fill. Setting
the threshold value to −∞ results in all options being
subject to deliberation.
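
The two-part filter for top-level options can be sketched as below. This is a minimal illustration under stated assumptions rather than the implemented agent: the Option record and the function names are hypothetical, and an option is reduced to a hole identifier, a score, and a now/later flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    hole_id: int
    score: int
    now: bool  # True: fill the hole now; False: fill it later


def is_compatible(option: Option, current: Optional[Option]) -> bool:
    """Only two different fill-now options conflict with each other."""
    if current is None or not option.now:
        return True
    return not (current.now and current.hole_id != option.hole_id)


def triggers_override(option: Option, current: Optional[Option], threshold: float) -> bool:
    """Override when the score difference equals or exceeds the threshold v."""
    if current is None:
        return False
    return option.score - current.score >= threshold


def passes_filter(option: Option, current: Optional[Option], threshold: float) -> bool:
    """An option passes if it is compatible or if it triggers an override."""
    return is_compatible(option, current) or triggers_override(option, current, threshold)
```

With a current intention to fill a 40-point hole now, a competing 70-point fill-now option passes at a threshold of 20 but not at 50; passing float("-inf") as the threshold lets every option through, matching the −∞ setting described above.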

Recall  that  an  option’s  passing  the  filter  does  not 
lead  directly  to  its  introduction  into  the  intention 
structure:  instead,  it  is  passed  to  the  deliberation  pro¬ 
cess  for  more  detailed  consideration  and  comparison 
with  the  current  intention.  Deliberation  may  involve 
extensive  analysis;  deliberation  strategies  can  be  cho¬ 
sen  in  the  Tileworld  agent  by  the  setting  of  a  param¬ 
eter.  We  currently  have  implemented  two  deliberation 
strategies. 

The  simpler  deliberation  module  evaluates  compet¬ 
ing  top-level  options  by  selecting  the  one  with  the 
higher  score.  When  there  is  a  nonnegative  threshold 
value  in  the  filter,  this  mode  of  deliberation  always  se¬ 
lects  the  new  option;  with  a  negative  threshold  value, 
it  instead  always  maintains  the  current  intention.  This 
illustrates  a  general  point:  if  deliberation  is  extremely 
simple,  it  may  be  redundant  to  posit  separate  deliber¬ 
ation  and  filtering  processes. 

A  more  sophisticated  deliberation  strategy  computes 
the  likely  value  (LV)  of  a  top-level  goal.  LV  is  an  esti¬ 
mate  of  expected  utility,  combining  information  about 
reward  (score)  with  information  about  likelihood  of 
success. For a given option to fill a hole h, LV is computed as

$$\mathrm{LV}(h) = \frac{\mathrm{score}(h)}{\mathrm{dist}(a,h) + \sum_{i=1}^{n} 2\,\mathrm{dist}(h,t_i)}$$

where score(h) is the reward for filling the hole,
dist(a, h) is the distance between the agent and the
hole, n is the number of tiles needed to fill the hole,
and dist(h, t_i) is the distance from the hole to the i-th
closest tile. The factor of 2 occurs because the agent
must traverse the interval in both directions, i.e., it
must make a “round trip”. If there are fewer than n
tiles available, LV(h) is zero.
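
As a worked illustration of the LV estimate, the sketch below computes it for a single hole. Several details are assumptions, not taken from the paper: distances are Manhattan distances on the grid, the hole is represented by one cell, and the names likely_value and manhattan are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column)


def manhattan(p: Cell, q: Cell) -> int:
    """Grid distance; assumed here, since the text does not name its metric."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def likely_value(score: float, agent: Cell, hole: Cell, tiles: List[Cell], n: int) -> float:
    """score(h) divided by dist(a, h) plus twice the distance to each of the n closest tiles."""
    if len(tiles) < n:
        return 0.0  # fewer than n tiles available
    closest = sorted(manhattan(hole, t) for t in tiles)[:n]
    denominator = manhattan(agent, hole) + sum(2 * d for d in closest)
    if denominator == 0:
        # Degenerate case not covered by the text; treated as unbounded here.
        return float("inf")
    return score / denominator


if __name__ == "__main__":
    # Agent 3 cells from the hole; the two closest tiles are 1 and 2 cells away,
    # so the denominator is 3 + 2*1 + 2*2 = 9.
    print(likely_value(50, (0, 0), (0, 3), [(0, 4), (2, 3), (5, 5)], n=2))
```

In the example, a 50-point hole therefore has an LV of 50/9, while the same hole with only one tile in the world and n = 2 has an LV of zero.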

We  intend  to  design  additional  deliberation  modules, 
including  one  that  performs  complete  means-end  rea¬ 
soning  for  all  options  under  consideration  before  mak¬ 
ing  its  decision.  Such  a  deliberator  must  not  be  in¬ 
voked  carelessly;  we  expect  our  filtering  mechanism  to 
be  increasingly  useful  as  we  add  more  sophisticated 
and  time-consuming  deliberation  components. 

Preliminary  Experiments 
With both the simulator and the agent in place, we are
in a position to conduct experimental studies of the
performance of the agent. By adjusting the Tileworld
“knobs”, we can control a number of domain characteristics.
We can vary what we call dynamism (the rate at
which new holes appear), hostility (the rate at which
obstacles appear), variability of utility (differences in
hole scores), variability of difficulty (differences in hole
sizes and distances from tiles), and hard/soft bounds
(holes having either hard timeouts or gradually decaying
in value). There are also variables we can adjust
in the agent: act/think rate (the relative speeds of acting
and thinking), the filter’s threshold level, and the
sophistication of the deliberation mechanism.

Experiment  1 

To  begin  with,  we  set  all  of  these  parameters  to  pro¬ 
vide  a  baseline  environment  which  is  dynamic,  vari¬ 
able,  and  moderately  paced.  In  this  environment,  a 
competent  agent  can  achieve  reasonable  scores,  but  is 
penalized  for  wasting  time  or  making  poor  choices.  We 
will  start  by  comparing  the  simple  deliberation  mech¬ 
anism,  based  on  score  value,  with  the  LV  evaluator, 
which  provides  a  better  estimate  of  marginal  utility. 
For  orientation,  we  have  also  included  the  results  of  a 
human  playing  the  role  of  the  agent  in  the  same  sim¬ 
ulation;  and  to  gain  an  idea  of  the  benefit  of  acting  in 
parallel  with  reasoning,  we  have  included  results  for  an 
agent that acts and reasons serially.

All  of  these  agents  were  tested  in  the  baseline  en¬ 
vironment  and  in  a  similar  but  more  rapidly  changing 
one.  In  the  faster  environment,  objects  appear  and  dis¬ 
appear  on  the  average  ten  times  more  quickly,  but  the 
agent  can  also  move  ten  times  more  quickly.  However, 
the  agent’s  reasoning  takes  place  at  the  same  rate  of 
speed  as  in  the  baseline  case,  so  the  opportunity  cost  of 
reasoning  is  correspondingly  greater  in  the  faster  en¬ 
vironment.  The  agents  were  all  evaluated  by  taking 
the  average  score  from  30  trials;  the  human  performed 
10.  Each  trial  is  a  self-contained  simulation  with  a  du¬ 
ration  of  5000  ticks  of  the  clock,  where  the  agent  can 
move  once  per  clock  tick. 


Experiment #1: Score for Agent

Speed     LV    LV/Serial   Simple   Simple/Serial   Human
Normal    396   353         347      291             468
Fast      256   234         183      152             3


The  differences  here  are  quite  apparent.  In  the  nor¬ 
mal  speed  environment,  the  human  subject  performed 
best,  because  he  had  more-sophisticated  planning  ca¬ 
pabilities  than  the  robot.  But  in  the  faster  environ¬ 
ment,  the  human’s  response  speed  was  insufficient  to 
allow  him  to  keep  up  with  the  pace  of  change. 

The  robot  agents  were  better  able  to  adjust  to  the 
more  rapidly  changing  environments,  but  it  is  clear 
that  the  cost  of  reasoning  is  still  significant  for  them. 
This  is  evident  both  from  an  overall  decrease  in  score 
in  the  high-speed  environment,  and  from  the  superi¬ 
ority  of  the  robot  agents  that  could  reason  and  act  in 
parallel. 

The  other  distinction  of  note  is  that  the  LV  evaluator 
performs  better  than  the  simple  evaluator,  as  expected. 

Experiment  2 

We  now  move  on  to  our  initial  experiments  directed 
at  understanding  some  of  the  design  trade-offs  in  our 
agent.  The  use  of  Tileworld  to  experimentally  evalu¬ 
ate  our  agent  architecture  is  an  ongoing  project,  and 
these  are  early  results.  We  stress  that  the  hypothe¬ 
ses  presented  below  are  preliminary;  significantly  more 
experimentation  and  statistical  analysis  of  the  results 
need  to  take  place  before  we  can  make  strong  claims 
about  the  relative  appropriateness  of  any  particular 
agent-design  strategy. 

In Experiment 2, we attempt to test the usefulness of
the filtering mechanism in our agent as implemented,
using the LV evaluator as the deliberation component,
and using the most quickly computed evaluation metric,
thresholding on the score value, as the filter override
mechanism. We vary the threshold from -100 to
100. Since the score for each hole ranges from 1 to
100, a threshold setting of -100 means that every new
option is subject to deliberation, while a setting of 100
means that no new option will ever be considered until
the currently executing plan is complete. The resulting
scores are summarized in the following chart, where
each value represents an average over 30 trials.


Experiment  #2 

At  the  slowest  speed  setting,  100  times  slower  than 
our  “normal”  setting,  it  is  better  to  do  no  filtering  at 
all.  The  scores  achieved  at  this  speed  decrease  con¬ 
sistently  as  the  threshold  is  increased.  At  the  normal 
speed setting, the effect of increased filtering still appears
to be negative, but less markedly so. At a setting
10  times  faster  than  the  normal  one,  there  seems  to  be 
little  correlation  between  threshold  level  and  perfor¬ 
mance,  although  the  uncertainty  in  the  results,  which 
appears  to  be  in  the  range  of  10-20  points,  prevents  a 
sure  determination.  We  hope,  in  the  future,  to  be  able 
to  make  even  these  relatively  subtle  determinations; 
the  noise  in  the  data  comes,  we  believe,  largely  from 
our  decision  to  use  actual  CPU-time  measurements  to 
determine  reasoning  time.  If  we  wish  to  get  the  clean¬ 
est  trials  possible,  we  may  need  to  use  a  time  estimate 
that  does  not  depend  on  the  vagaries  of  the  underlying 
machine  and  Lisp  system.  Failing  that,  we  will  need 
to  model  the  uncertainty  involved,  and  run  larger  trial 
sets. 

To  sum  up  the  results  of  this  experiment,  we  see  that 
filtering  is  harmful  at  slow  speeds,  and  even  at  high 
speeds  does  not  give  a  net  benefit.  Our  hypothesis  is 
that  the  time  cost  of  the  LV  evaluator  is  not  very  high, 
and  consequently,  it  is  usually  worth  taking  the  time  to 
engage  in  extra  deliberation  about  new  opportunities. 
The  fact  that  filtering  is  less  detrimental  in  the  faster 
environment  leads  us  to  hypothesize  that  there  may  be 
a  break-even  point  at  even  faster  speeds,  above  which 
filtering  is  useful;  we  intend  to  test  for  such  a  point.  We 
also  intend  to  implement  more  accurate  (and  costly) 
deliberation  mechanisms  in  the  near  future.  For  these, 
filtering may be much more valuable; perhaps the LV
estimator is efficient enough that it can itself be used
as  the  filter  override  mechanism  for  the  more  complex 
deliberation  components. 

Experiment  3 

In  our  third  experiment,  we  attempt  to  test  a  conjec¬ 
ture  that  the  LV  evaluator  as  described  is  deficient  in 
an  important  way:  it  does  not  consider  the  time  cost 
of  means-end  reasoning  already  performed.  We  modify 
the  deliberation  functions  by  adding  a  bias  in  favor  of 
existing  intentions,  since  typically  at  deliberation  time, 
some  means-end  reasoning  about  how  to  achieve  these 
has already taken place. This is distinct from Experiment
2, in which we adjusted the filtering mechanism
in an attempt to save deliberation time; here we investigate
a bias in the deliberation process itself with the
intent  of  reducing  the  time  cost  of  means-end  reason¬ 
ing. 

We  consider  two  cases.  In  the  first,  deliberation  is 
done  by  the  simple  evaluator,  and  we  apply  a  bias  to¬ 
wards  existing  intentions  equal  to  a  fixed  number  of 
points.  In  the  second,  deliberation  is  done  by  the  LV 
evaluator,  and  we  apply  a  bias  equal  to  a  fraction  of 
the  current  LV.  Thus,  for  example,  with  a  100  percent 
bias,  a  newly  appearing  hole  must  have  double  the  LV 
of  the  current  one  to  be  adopted  as  a  new  intention. 
The  environment  settings  and  simulation  sizes  are  the 
same  as  for  Experiment  2. 
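
The two bias schemes can be sketched as simple comparison rules. This is an illustration only: the function names are hypothetical, and the treatment of ties (an exactly matching value adopts the new option) is an assumption, since the text does not say how ties are resolved.

```python
def simple_adopts_new(new_score: float, current_score: float, bias_points: float = 0.0) -> bool:
    """Simple evaluator: the current intention gets a fixed head start in points."""
    return new_score >= current_score + bias_points


def lv_adopts_new(new_lv: float, current_lv: float, bias_fraction: float = 0.0) -> bool:
    """LV evaluator: the current intention's LV is inflated by a fraction of itself."""
    return new_lv >= current_lv * (1.0 + bias_fraction)
```

With bias_fraction set to 1.0 (a 100 percent bias), a new hole is adopted only when its LV is at least double that of the current one, as in the example above; with a bias of zero both rules reduce to picking the option with the higher value.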


Experiment #3: Simple Evaluator (chart; horizontal axis: bias in points)

Experiment #3: LV Evaluator (chart)


As shown by the experimental results, bias in the deliberator
does not appear to have a clear effect on total
performance.  For  the  simple  evaluator,  this  isn’t  terri¬ 
bly  surprising;  it  provides  a  fairly  weak  assessment  of  a 
hole’s  actual  potential  value  in  any  case.  We  expected 
to  see  much  more  effect  of  bias  on  the  LV  evaluator, 
however.  Two  hypotheses  are  available  to  explain  this. 
First,  our  test  environment  may  have  too  many 
opportunities  available,  minimizing  the  potential  cost  of  high 
bias:  if  the  agent  spends  most  of  its  time  doing  some¬ 
thing  with  high  utility,  a  few  missed  opportunities  will 
not  have  a  significant  impact  on  the  final  score.  This 
hypothesis  can  be  tested  in  a  less  favorable  environ¬ 
ment.  Second,  it  may  be  that  means-end  reasoning 
in  the  current  implementation  is  too  inexpensive,  min¬ 
imizing  the  potential  benefit  of  high  bias.  This  hy¬ 
pothesis  can  be  tested  by  increasing  the  size  of  the 
environment  to  increase  the  planning  time  required; 
the  addition  of  more  complex  planning  routines  would 
also  provide  situations  in  which  there  is  a  higher  time 
cost  associated  with  planning. 

Conclusion 

The  experiments  we  have  run  to  date  have  included 
some  important  milestones  in  the  Tileworld  effort.  The 
Tileworld  domain  has  been  demonstrated,  and  has 
been  shown  to  be  a  viable  system  for  evaluating  agent 
architectures.  The  Tileworld  agent  was  demonstrated 
and  used  to  test  differing  deliberation  and  filtering 
strategies  as  described  in  [Bratman  et  al.,  1988]. 

The  Tileworld  project  is  ongoing.  There  are  a  num¬ 
ber  of  specific  research  tasks  that  we  intend  to  pur¬ 
sue  in  the  near  future.  Perhaps  most  importantly,  we 
will  be  continuing  our  experimental  efforts.  The  hy¬ 
potheses  we  drew  from  our  preliminary  experiments 
suggested  several  obvious  follow-ons,  as  described  in 
the  preceding  section.  It  will  be  particularly  useful  to 
vary  parameters  other  than  those  that  control  speed, 
for  example,  size  of  the  overall  space,  distribution  of 
task  value  and  difficulty,  and  availability  of  limited  re¬ 
sources  such  as  tiles. 

We  will  also  implement  more  sophisticated  deliber¬ 
ation  algorithms,  and,  having  done  so,  will  attempt  to 
identify  better  the  principles  separating  the  processing 
that  is  done  in  the  filtering  mechanism  from  that  done 
in  the  deliberation  procedure.  In  addition,  we  plan  to 
implement  a  foveated  perceptual  scheme,  in  which  the 
agent  has  access  to  detailed,  precise  information  about 
its  immediate  surroundings  and  has  only  increasingly 
abstract,  incomplete,  and  uncertain  information  about 
more  distant  locations  in  its  environment.  Another 
possibility  is  to  add  learning  to  the  system:  two 
areas  of  potential  benefit  are  in  the  means-ends 
reasoner  (e.g.,  explanation-based  learning  of  control  rules) 
and  in  evaluations  of  marginal  utility  (e.g.,  empiri¬ 
cal  improvement  of  utility  evaluations).  Finally,  we 
hope  to  extend  the  architecture  to  handle  more  difficult 
questions  involving  intention  coordination.  We  expect 
that  both  means-end  reasoning  and  deliberation  will 
become  much  more  difficult,  and  hence  filtering  much 
more  important,  when  the  intention  structure  involves 
more  complex  interactions  among  intentions. 

More  generally,  we  continue  to  investigate  the  larger 
question  of  how  an  agent  should  structure  and  control 
its  computational  effort.  We  believe  that  the  architec¬ 
ture  discussed  here  is  a  special  case  of  a  more  general 
framework,  and  we  are  working  towards  a  definition  of 
that  framework  and  its  verification  in  our  domain.  We 
also  see  the  Tileworld  testbed  as  a  good  basis  for  com¬ 
parison  of  other  agent  architectures  proposed  in  the 
literature,  and  we  strongly  encourage  other  researchers 
to  demonstrate  their  agents  in  our  domain.^ 

The  overall  goal  of  our  project  is  an  improved  un¬ 
derstanding  of  the  relation  between  agent  design  and 
environmental  factors.  In  the  future,  when  faced  with 
a  performance  domain  for  an  agent,  one  should  be 
able  to  draw  on  such  an  understanding  to  choose  more 
wisely  from  the  wide  range  of  implementation  possibil¬ 
ities  available. 